Index Structures for Information Filtering Under the Vector Space Model

Authors

Tak W. Yan

Hector Garcia-Molina

Department of Computer Science, Stanford University, Stanford, CA 94305

Abstract

With the ever-increasing volume of electronic information being generated, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanisms. The number of users, and thus profiles (representing users' long-term interests), handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which, instead of indexing every term in a profile, we select only the significant ones to index. We evaluate their performance and show that the indexing methods require orders of magnitude fewer I/Os to process a document than when no index is used. We also show that the proposed alternative performs better in terms of I/O and CPU processing time in many cases.
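As a rough illustration of the first idea in the abstract (an inverted index built over user profiles rather than over documents), the following is a minimal in-memory sketch, not the authors' implementation. It assumes that profiles and documents are term-to-weight dictionaries, that similarity is the cosine measure, and that each profile carries its own relevance threshold. The names `ProfileIndex`, `add_profile`, `match`, and `normalize` are hypothetical, and the disk layout and I/O behavior studied in the paper are not modeled.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a term -> weight dict to unit length (empty dict if all zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class ProfileIndex:
    """Hypothetical inverted index over profiles: term -> [(profile_id, weight)]."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.thresholds = {}

    def add_profile(self, profile_id, weights, threshold):
        # Index every term of the profile (the "standard" variant).
        for term, w in normalize(weights).items():
            self.postings[term].append((profile_id, w))
        self.thresholds[profile_id] = threshold

    def match(self, document):
        # Accumulate dot products only for profiles that share a term
        # with the document; with unit vectors this equals the cosine.
        scores = defaultdict(float)
        for term, dw in normalize(document).items():
            for pid, pw in self.postings.get(term, ()):
                scores[pid] += dw * pw
        return sorted(p for p, s in scores.items() if s >= self.thresholds[p])


# Illustrative data only; the profiles and weights are made up.
index = ProfileIndex()
index.add_profile("alice", {"index": 1.0, "filtering": 1.0}, threshold=0.5)
index.add_profile("bob", {"stock": 1.0, "market": 1.0}, threshold=0.5)

matched = index.match({"index": 2.0, "filtering": 1.0, "vector": 1.0})
unmatched = index.match({"weather": 1.0})
print(matched, unmatched)
```

Only the postings lists for terms that occur in the incoming document are visited, which is why an index of this kind can avoid scanning every profile. The alternative described in the abstract would index only a selected subset of each profile's terms; how those terms are chosen is not stated here, so it is left out of the sketch.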

Publication date: 1994
